Virus Evolution
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Background: Rapid emergence and replacement of SARS-CoV-2 variants underscore the need for early and reliable indicators of variant dominance to guide timely public health response. However, early genomic trajectories are typically short, sparse, and noisy, with strong fluctuations and substantial cross-country heterogeneity in sequencing intensity and reporting. Methods: We develop a scalable forecasting framework that predicts whether new variants will reach high prevalence and how long they will...
The emergence of variants has shaped the COVID-19 pandemic. The lack of directly observed precursors to these variants has led to proposals that variants emerge from persistent infections, transmission in non-human animal populations after reverse zoonosis, or cryptic transmission in the human population. We investigated the origin of variants by analyzing the molecular clock and the rates of nonsynonymous and synonymous substitutions in SARS-CoV-2 circulating in the human population, persistently...
Background: Respiratory syncytial virus (RSV) is a leading cause of severe acute respiratory infection in infants, young children and vulnerable adults. Despite implications for designing interventions, our understanding of RSV infection/reinfection patterns during community outbreaks is incomplete. Methods: To characterize respiratory virus infections regardless of symptom status, we performed a prospective cohort surveillance in coastal Kenya from August 2023 to August 2024. Nasopharyngeal/oropha...
SARS-CoV-2 mutations play a key role in viral evolution, in immune escape, and potentially in disease severity. However, the clinical impact of most mutations remains poorly understood, particularly across different variants. A historical observational study was conducted using SARS-CoV-2 whole-genome sequencing data linked to clinical metadata from 175,503 COVID-19 cases in Israel. The dataset was stratified into four variant-specific periods: B.1.1.7, B.1.617.2, BA.1, and BA.2. Logistic regres...
Reunion Island experienced a massive chikungunya virus outbreak in 2024-2025, with more than 54,000 confirmed cases. This is the second major chikungunya outbreak on the island, following the first one that peaked 20 years ago. The new outbreak has been traced to a single introduction event onto the island, offering a unique opportunity to exploit viral genomic data to understand the epidemiological and dispersal dynamics of the introduced transmission chain. We ...
Accurate identification of unknown pathogens is critical for medicine and public health, yet current metagenomic workflows remain heavily dependent on specialized bioinformatics expertise and manual interpretation, creating substantial bottlenecks in time-sensitive diagnostic settings [1]. The key challenges lie in achieving precise species identification amidst high background noise and translating complex microbial data into clinically actionable insights [2,3]. Here we present the Global Pathogen A...
In November 2024, a highly divergent BA.3-related SARS-CoV-2 lineage, designated BA.3.2, was detected in South Africa, marking the first appearance of a BA.3-derived lineage in over two years. Phylogenetic reconstruction places BA.3.2 on an extended branch descending from ancestral BA.3, with no intermediate genomes detected, consistent with a prolonged period of unsampled or isolated evolution. Molecular clock analyses indicate accelerated divergence characteristic of a saltation event, while p...
In nonsuppressible HIV viremia (NSV), individuals have persistently detectable viral load despite adherence to ≥2 fully active antiretroviral drugs. NSV represents an area of clinical uncertainty and an opportunity to understand the mechanisms of HIV persistence. We performed in-depth virologic characterization to identify distinct NSV phenotypes. We categorized participants into those who had persistent viremia after antiretroviral therapy (ART) initiation (primary NSV) and those who had N...
Herpes simplex virus (HSV) is an endemic pathogen, infecting most adults worldwide. HSV infection can cause a wide spectrum of disease outcomes, ranging from asymptomatic infection or mild lesions to rare cases of infectious keratitis, encephalitis, and death. HSV genome sequences have been shown to differ between individual patients, as well as within individuals. To date, the vast majority of publicly available HSV genomic data has come from Europe and North America. Our current understanding...
Influenza A(H3N2) subclade K virus was detected in Canada early in the 2025/26 influenza season, bearing an antigenic transition in the hemagglutinin (HA) glycoprotein. Analysis of 396 HA sequences from Canada showed antigenic divergence from 2025/26 influenza vaccine strains, consistent with partial mismatch. Phylodynamic analysis revealed sustained pre-vaccine transmission without clear post-vaccine expansion. Phylogenetic and phylogeographic analyses indicated interprovincial mixing within a ...
Clonal hematopoiesis (CH), defined by the expansion of hematopoietic cells with somatic mutations in leukemogenic genes (CH of indeterminate potential (CHIP)) or with mosaic chromosomal alterations (mCAs), is associated with aging and adverse health outcomes in the general population. CHIP prevalence has been shown to be higher in People with HIV (PWH) than in controls. However, the full spectrum, prevalence, and clinical consequences of CH in PWH remain incompletely understood. Here, we provide...
Strengthening in-country sequencing capacity generated 28 Lassa virus genomes from human clinical cases, expanding our knowledge of Lassa fever in Guinea. Phylogeographic analysis revealed cross-border exchange between Liberia and the NZerekore region, and a Sierra Leone introduction into the Gueckedou area. Enhanced genomic surveillance is crucial to guide future public health actions.
The recent SARS-CoV-2 pandemic has highlighted the growing importance of infectious disease analysis. An accurate and robust model can empower public health leaders to make timely decisions on social distancing and vaccination policies, thereby reducing the number of cases, hospitalizations and deaths. However, the emergence of new variants and subvariants can significantly alter the transmissibility, immune escape capacity and virulence of the pathogen in a short time, making the number of case...
Several African countries experienced a surge in acute hemorrhagic conjunctivitis (AHC) cases in 2024. Investigations in Kenya, Mayotte and Tanzania identified coxsackievirus A24 variant (CV-A24v) as the causative agent. To date, limited genomic data exist to elucidate the sources, epidemiology, and evolution of CV-A24v in Africa. We generated 245 CV-A24v genomes from samples collected between January and September 2024 in coastal Kenya. Phylogenetic analysis showed that these viruses belonged t...
Human metapneumovirus (HMPV) is a significant cause of acute respiratory illness in both children and adults, yet its genomic epidemiology remains understudied compared to other respiratory viruses. Here, we report an expanded genomic surveillance of HMPV in the United States, utilizing 325 newly sequenced samples from two cohorts: the household-based HIVE study (2010-2022) in Michigan and the multicenter IVY network (2022-2025) of hospitalized adults. Our analyses revealed the continued predomi...
Objective: West Nile virus lineage 2 (WNV-2) is a growing public health concern in Europe causing West Nile fever or West Nile neuroinvasive disease (WNND) with substantial morbidity and mortality; however, genomic data from Southern Italy are limited despite recent expansion of autochthonous transmission. The aim of the study was to characterise the phylogenetic and molecular features of the WNV-2 strain responsible for the first autochthonous human infection reported in Calabria (2023), and two ...
Bacterial infections are a major cause of morbidity and mortality among children under five in low- and middle-income countries (LMICs). Children in LMICs are exposed to and colonized by a range of pathogenic bacteria, yet patterns of bacterial exchange between humans are not well known, in part because culturing and sequencing single bacterial isolates is labor-intensive. Here, we apply a machine learning strain tracking approach to metagenomic data from 511 stool samples from children and moth...
The emergence of vaccine-covered serotypes causing invasive pneumococcal disease (IPD) is a serious concern worldwide. We investigated the unexpected rise of serotype 4 causing IPD primarily in non-vaccinated young adults after the COVID-19 pandemic that further spread to adults ≥65 years in recent years. For this purpose, we conducted a retrospective study of serotype 4 IPD cases (n=827) reported in Spain between 2009 and 2024. Whole-genome sequencing was performed to assess clonal lineag...
Biological fitness quantifies the efficiency and selective advantage of pathogens and hosts in their bilateral interaction. Key questions--such as how much more infectious an emerging variant is compared with its predecessor, or how much protection vaccination offers relative to no vaccination--require fitness to be measured systematically, in real time, and ideally beyond controlled laboratory settings. We propose an approach that infers biological fitness from mostly non-biological data on inf...
Background: The hyperpolymorphic nature and structural complexity of the human leukocyte antigen (HLA) genomic region present challenges for accurate and scalable typing across diverse sample types. While whole-genome sequencing (WGS) offers the opportunity to infer HLA genotypes without targeted enrichment, systematic benchmarks across sequencing platforms, biospecimens and coverage levels remain limited. Results: We assembled a multi-platform resource of WGS datasets derived from short-read (Illum...